BACKGROUND OF THE INVENTION

Technical Field 

The present invention relates to the field of speech recognition, and in particular, 
to reducing the available speech elements within a speech grammar during a dialog. 

Description of the Related Art 

In speech recognition systems such as ViaVoice®, speech recognition can be 
performed by receiving a user spoken utterance through an input device such as a 
microphone or a headset. The received user spoken utterance can be analyzed and 
converted into speech elements. The analyzed speech elements and speech elements 
accumulated in a database can be compared. Thus characters and words that 
correspond to the entered speech elements can be extracted. Notably, the speech 
elements accumulated in the database need not be individually or independently 
stored, but rather can be stored relating to a grammar which follows particular kinds of 
rules. For example, in the case of recognizing a four-digit number as shown in Fig. 
9(a), four digits of <num1> are defined as <digits> wherein a predetermination has 
been made that Arabic numbers from 0 to 9 can be entered. Under this grammatical 
definition, a speech elements expression table can be defined as shown in Fig. 9(b).
Specifically, "0" can correspond to the four speech elements of "ree", "ree:", "rei", and 
"zero". Similarly, "1" can correspond to "ichi", a number "2" can correspond to three 
speech elements, "3" to one speech element, "4" to four speech elements, etc. Fig.
9(c) shows an example where the grammar of Fig. 9(a) has been applied to the speech
elements expression of Fig. 9(b). The grammar and the speech elements expression of
Fig. 9(c) can be used as practical base forms.
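
For illustration only, the grammar and the speech elements expression table described above can be sketched in Python as follows. The table lists only the speech elements named in the text for "0" and "1", so it is deliberately incomplete. The names ELEMENT_TO_DIGIT and recognize_digits are hypothetical and are not taken from any actual recognition engine.

```python
# Partial, illustrative speech elements expression table in the spirit of
# Fig. 9(b). Only the elements named in the text for "0" and "1" are listed.
ELEMENT_TO_DIGIT = {
    "ree": "0", "ree:": "0", "rei": "0", "zero": "0",
    "ichi": "1",
}


def recognize_digits(elements, length=4):
    """Apply the <digits> grammar: exactly `length` elements, each one digit."""
    if len(elements) != length:
        raise ValueError(f"expected {length} speech elements, got {len(elements)}")
    # Each speech element is looked up in the table to obtain its character.
    return "".join(ELEMENT_TO_DIGIT[element] for element in elements)


result = recognize_digits(["zero", "ichi", "rei", "ichi"])
```

Because several speech elements map to the same character, "zero" and "rei" both yield "0" in this sketch.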

If received speech corresponding to <digits> is "zeroichinisan", the speech can
be analyzed into speech elements wherein "zero", "ichi", "nii:" and "sa_n" can be
obtained. In that case, the numbers "0", "1", "2", and "3" corresponding to each speech
element can be obtained from the speech elements correspondence table. Each
number can be applied to the grammatical definition such that the four characters
"0123" can be obtained as a recognition result for <digits>.

In speech recognition systems such as ViaVoice®, a method for improving
recognition accuracy called enrollment can be adopted. Enrollment can detect
individual differences in received speech and learn acoustic characteristics that fit each
individual. When the reading of numbers in the Japanese language is considered,
however, speech recognition accuracy of such numbers is not always high.

Several possible factors can be identified, each of which can decrease speech
recognition system accuracy. One factor can be that the Japanese words for numbers
such as "ichi", "ni" and "san" are generally short and have less sound prolixity. There
can be little difference among speech elements of a short word. Thus,
misrecognition of speech elements can easily occur during speech recognition.
Other Japanese words for numbers can be comprised of one syllable such as "ni",
"shi", "go" and "ku". The decreased sound prolixity for these words can be even more
conspicuous.

Another factor can be that some Japanese words for numbers can be 
represented by a plurality of readings, speech elements, or pronunciations. For 
example, readings such as "zero", "rei" and "maru" can correspond to a number "0";
"shi" and "yon" to "4"; "nana" and "shichi" to "7"; and "kyuu:" and "ku" to "9". When a 
plurality of readings correspond to a single number, the number of speech element 
candidates to be recognized is increased. This can cause a higher probability of 
erroneous speech recognition. 

Another factor can be that similar speech elements exist in different numbers.
For example, the speech elements of "shichi" (7), "ichi" (1) and "hachi" (8) are similar to
one another, as are the speech elements "shi" (4) and "shichi" (7). Additionally, the 
speech elements of "ni" (2) and "shi" (4) are similar, as well as those of "san" (3) and 
"yon" (4). Discrimination between such similar speech elements can be difficult due to 
the similarity of sound. As a result, erroneous recognition can become more probable. 
The problem can become more conspicuous where speech recognition is performed 
over a telephone line and the like where the available channel bandwidth is limited. For 
example, discriminating speech having the vowel "i" which requires recognition of a low 
frequency component can become more difficult with a limited bandwidth. 

Other factors can include the pronunciation of words having one syllable with a 
long vowel wherein the long vowel is not necessarily included or pronounced in every 
situation. In that case, discrimination of such syllables can be difficult. Pronunciations 
such as "ni", "nii:", "nii:nii:" and "go", "goo:", "goo:goo:" are examples. Particularly, the 
character "5", which is usually pronounced "goo:", can be pronounced "go" as in "shichigosan"
in the case of "753" and also as in "sangoppa" in the case of "358".
"Goo:" further can be pronounced as "go" or "go" with a very short vowel and a plosive,
which further can complicate the problem.

Speech recognition of numbers via telephone and the like is commonly used in
various business applications. Examples can include entering member numbers, goods 
selection numbers, etc. Consequently, there can be significant benefits to the 
improvement of speech recognition of numbers, especially with regard to the 
development of business applications. 

It should be appreciated that enrollment can improve speech recognition 
accuracy to a certain extent by matching acoustic characteristics to individuals. Further 
improvement of speech recognition accuracy, however, can be limited in the case 
where received speech elements are similar to each other and the speech elements do 
not have prolixity as described above. 

SUMMARY OF THE INVENTION

One object of the present invention can be to improve speech recognition 
accuracy, especially with regard to improving recognition accuracy of characters, words 
and the like which can correspond to a plurality of readings. 

Another object of the invention can be to improve speech recognition accuracy 
for a sound having less prolixity such as numbers in the Japanese language and 
characters wherein a similar sound can correspond to different characters or words. 

Another object of the invention can be to improve speech recognition accuracy in 
the case where pronunciation of a one syllable character with a long vowel can be 
changed into syllables with short vowels and repeated.

The present invention makes good use of the knowledge of the present inventors 
that the same person can maintain the same reading consistently in one conversation. 
In other words, a person who pronounced "7" as "shichi" has a tendency to keep
pronouncing "shichi" consistently during the conversation. Making good use of this 
tendency, the present invention removes a speech element array corresponding to a 
reading that the person did not use in the first response in the conversation, or lowers a 
recognition probability for the reading, which can be applied in recognizing subsequent 
responses. 

Therefore, a speech recognition system of the present invention can have 
correspondence information in which correspondence between a recognized word and 
a speech element array for expressing pronunciation of the recognized word can be 
stored, and recognizes one or more recognized words from an entered speech input by 
comparing a speech element array generated from entered speech with the speech 
element array in the correspondence information. In the case where a recognized word 
corresponding to the speech element array that was recognized in an already performed
recognition process corresponds to a plurality of speech element arrays, a 
pronunciation prediction probability of at least one speech element array which is 
different from the recognized speech element array among the plurality of speech 
element arrays can be lowered. 

A speech recognition method of the present invention, for a conversation of the
same person in a certain period of time, can include the steps of: receiving a first
speech input and generating a speech element array from the first speech input;
searching correspondence information in which the correspondence between recognized
words and speech element arrays expressing pronunciation of the recognized words,
together with the pronunciation prediction probability of each speech element array, can be stored; and generating
one or more recognized words through comparison between the speech element array 
generated by the first speech input and the speech element arrays in the 
correspondence information. The method further can include lowering a pronunciation 
prediction probability of at least one speech element array which differs from the 
recognized speech element array among the plurality of speech element arrays. In the 
case where a recognized word which corresponds to a recognized speech element 
array is made to correspond to a plurality of speech element arrays and is stored, the 
method can include receiving a second speech input; generating a speech element 
array from the second speech input; and searching correspondence information in 
which the pronunciation prediction probability of the speech element array is lowered. 
Also, the method can include generating one or more recognized words through 
comparison between the speech element array generated by the second speech and 
the speech element arrays in the correspondence information. Thus, a probability of 
erroneous recognition from the second time on is lowered to improve the recognition 
accuracy. 
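
As a minimal sketch of the lowering step, assuming the correspondence information is held as a dictionary of readings with probabilities, the update after the first recognition could look like the following. The uniform starting probabilities, the renormalization, and the names lower_unused_readings and factor are assumptions made for illustration and are not specified by the invention.

```python
def lower_unused_readings(table, recognized, factor=0.0):
    """Lower the probability of readings the speaker did not use.

    table:      {word: {reading: probability}}
    recognized: {word: reading used in the first response}
    factor:     multiplier applied to unused readings; 0.0 removes them.
    """
    updated = {}
    for word, readings in table.items():
        used = recognized.get(word)
        if used is None or used not in readings:
            # Word not heard yet: keep its readings unchanged.
            updated[word] = dict(readings)
            continue
        scaled = {r: (p if r == used else p * factor) for r, p in readings.items()}
        total = sum(scaled.values())
        # Renormalize (an assumption) and drop readings whose probability is zero.
        updated[word] = {r: p / total for r, p in scaled.items() if p > 0}
    return updated


# Assumed uniform starting probabilities for the readings named in the text.
base = {
    "7": {"shichi": 0.5, "nana": 0.5},
    "4": {"shi": 0.5, "yon": 0.5},
    "9": {"ku": 0.5, "kyuu": 0.5},
}
after_first = lower_unused_readings(base, {"7": "shichi", "4": "yon"})
softened = lower_unused_readings(base, {"7": "shichi"}, factor=0.2)
```

With factor set to 0.0 the unused reading is removed outright, while a value between 0 and 1 only lowers it, which corresponds to the two alternatives described above.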

Characters, phrases and words can be included in the recognized words, and 
grammar information by which the recognized words can be arrayed in a specified rule 
can be included in the correspondence information. In addition, the recognized words 
can be numbers, numerals or words expressing numbers in the Japanese or English 
language. The present invention can be particularly useful in recognizing numbers. 

In the case where a recognized word corresponding to a recognized speech 
element array is made to correspond to a plurality of speech element arrays and is 
stored, a function that lowers the pronunciation prediction probability for at least one 
speech element array which is different from the recognized speech element array
among the plurality of speech element arrays, and a function that removes a speech
element array which is different from the recognized speech element array, thereby setting its
pronunciation prediction probability to zero, can be included. Moreover, with regard to different
speech element arrays that express pronunciation for the same recognized word, the 
speech element arrays having a number that suits a previously measured pronunciation 
prediction probability and the recognized word corresponding thereto can be included in 
the correspondence information. This further can increase the recognition accuracy. 

The certain period of time can be a period of time in one continued conversation 
or a period of time that includes a plurality of conversations in one day. The 
pronunciation tendency of a speaker is highly apt to be maintained for a short period of 
time such as a unit of one day, especially during the same conversation. On the 
contrary, after time has passed, the pronunciation tendency of the speaker can change. 
In such a case, according to the present invention, the information to be used for 
recognition can be returned to the initial state after the certain period of time has 
passed, without maintaining the pronunciation tendency of the same speaker for a long 
time. This operation is possible because the present invention adopts a
temporary learning technique, for example one limited to a single conversation.
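
The return to the initial state can be sketched as follows, assuming the readings are kept per conversation together with a start time. The class name SessionGrammar, the ttl_seconds parameter, and the injectable clock are hypothetical choices made so that the example can run without waiting for a day to pass.

```python
import time


class SessionGrammar:
    """Readings that are pruned temporarily and restored after `ttl_seconds`."""

    def __init__(self, base_readings, ttl_seconds, clock=time.monotonic):
        self._base = {word: list(r) for word, r in base_readings.items()}
        self._ttl = ttl_seconds
        self._clock = clock
        self.reset()

    def reset(self):
        """Return to the initial state and restart the period."""
        self._current = {word: list(r) for word, r in self._base.items()}
        self._started = self._clock()

    def readings(self):
        """Current readings; the base state is restored once the period has passed."""
        if self._clock() - self._started >= self._ttl:
            self.reset()
        return self._current

    def drop_reading(self, word, reading):
        """Remove a reading the speaker did not use (temporary learning)."""
        current = self.readings()
        current[word] = [r for r in current[word] if r != reading]


# Usage with a fake clock standing in for real elapsed time.
now = [0.0]
session = SessionGrammar({"7": ["shichi", "nana"]}, ttl_seconds=86400,
                         clock=lambda: now[0])
session.drop_reading("7", "nana")
within_period = session.readings()["7"]
now[0] = 86400.0
after_period = session.readings()["7"]
```

Nothing learned about the speaker survives the reset, which is consistent with the temporary nature of the technique.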

Note that, in the case where the conversation in the certain period of time is not 
continuous, a method of specifying a speaker by analyzing a password, a member 
number, an originating side telephone number or speech, or a method of specifying a 
speaker by combining these can be used. 

In the case where at least a part of one or more recognized words is referred back to a
speaker and it is judged whether an error exists in recognizing the one or more
recognized words, if an error is found, the one or more recognized words can be
replaced by one or more words that are easily confused with them. By
adopting such means or a method, the recognition accuracy of a continuous
pronunciation for a long number having a check digit, for example, can be improved.

In addition, in the case where a number of the one or more recognized words 
which are recognized does not conform to the number of words that was previously
registered in the recognition system, a recognized word that corresponds to a speech 
element having a syllable of a long vowel among the one or more recognized words 
which are recognized can be replaced by a repetition of a recognized word that 
corresponds to a short vowel speech element corresponding to the long vowel. 
Alternatively, a repetition of a recognized word that corresponds to a speech element 
having a syllable of a short vowel among the recognized one or more words can be 
replaced by a recognized word that corresponds to a syllable of a long vowel 
corresponding to the short vowel. By adopting such means or a method, the 
recognition accuracy can be improved for the case where a word expressed in a 
syllable with a long vowel is repeated in a form with short vowels, or a word expressed 
in a repetition of a short vowel is recognized erroneously as a word of a syllable with a 
long vowel. 
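
The two replacements can be sketched as follows, assuming the recognizer returns a list of speech elements and the expected number of words is known in advance. The pairs "nii:"/"ni" and "goo:"/"go" come from the pronunciations discussed in the background; the function name repair_length and the left-to-right order of repair are assumptions.

```python
# Long-vowel elements and the short-vowel elements they correspond to.
LONG_TO_SHORT = {"nii:": "ni", "goo:": "go"}
SHORT_TO_LONG = {short: long_ for long_, short in LONG_TO_SHORT.items()}


def repair_length(elements, expected):
    """Adjust the element count to `expected` using long/short vowel swaps."""
    elements = list(elements)
    # Too few words: a long vowel may really be a repeated short vowel.
    i = 0
    while len(elements) < expected and i < len(elements):
        short = LONG_TO_SHORT.get(elements[i])
        if short is not None:
            elements[i:i + 1] = [short, short]
            i += 2
        else:
            i += 1
    # Too many words: a repeated short vowel may really be one long vowel.
    i = 0
    while len(elements) > expected and i + 1 < len(elements):
        if elements[i] == elements[i + 1] and elements[i] in SHORT_TO_LONG:
            elements[i:i + 2] = [SHORT_TO_LONG[elements[i]]]
        i += 1
    return elements


expanded = repair_length(["ichi", "nii:", "sa_n"], expected=4)
merged = repair_length(["go", "go", "ichi"], expected=2)
```

When the count already matches the registered number of words, the list is returned unchanged.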

Another aspect of the invention can include a speech recognition system 
including correspondence information. The correspondence information can be for 
storing a correspondence between recognized words and a plurality of speech element 
arrays for expressing pronunciation of the recognized words. The speech recognition 
system can recognize a recognizable word from a received user spoken utterance by 
comparing a speech element array generated from the user spoken utterance with the 
plurality of speech element arrays in the correspondence information. In a dialog of a 
single person occurring within a certain period of time, the generated speech element 
array can correspond to one of the plurality of speech element arrays. A pronunciation
prediction probability corresponding to another one of the plurality of speech element arrays,
namely one that is different from the generated speech element array, can be lowered.

The different speech element arrays expressing pronunciation for a single 
recognized word can include a number corresponding to a previously measured 
pronunciation prediction probability and a recognized word corresponding to the 
previously measured pronunciation prediction probability. Programming means for 
detecting erroneously recognized words by referring a speaker to at least a part of the
recognized word and programming means for replacing one of the erroneously 
recognized words with a recognizable word which can be recognized as one of the 
erroneously recognized words also can be included. 

The speech recognition system further can include programming means for 
replacing a recognized word which corresponds to a speech element comprising one 
syllable with a long vowel with a previously recognized word comprising one syllable 
with a short vowel corresponding to the long vowel, when a number of recognized 
words do not conform to a previously registered number in the speech recognition 
system. Programming means for replacing a recognized word corresponding to a 
speech element having one syllable with a short vowel with another previously 
recognized word corresponding to one syllable with a long vowel, wherein the long 
vowel can correspond to the short vowel also can be included. 

Another aspect of the invention can include a speech recognition method for use 
within a dialog of a single person, wherein the dialog can occur in a certain period of 
time. The method can include receiving a first user spoken utterance and generating a 
first speech element array from the first user spoken utterance; searching 
correspondence information wherein the correspondence information can associate 
recognizable words with a plurality of speech element arrays expressing pronunciation 
of the recognizable words; generating a first recognized word by comparing the first 
speech element array and the plurality of speech element arrays in the correspondence 
information; and lowering a pronunciation prediction probability of one of the plurality of 
speech element arrays which differs from the first speech element array, wherein one of 
the plurality of speech element arrays can be made to correspond to the first speech 
element array. 

The method further can include receiving a second user spoken utterance and 
generating a second speech element array from the second user spoken utterance; 
searching the correspondence information comprising the lowered pronunciation 
prediction probability; and generating a second recognized word by comparing the 
second speech element array and the plurality of speech element arrays in the
correspondence information. The correspondence information can include one of the 
plurality of speech element arrays having a number corresponding to a measured 
pronunciation prediction probability corresponding to one of the recognizable words. 

The method further can include determining one of the recognized words to be 
erroneous by referring a speaker to at least part of one of the recognized words, and 
replacing the erroneous word with a different recognizable word. The different 
recognizable word can be erroneously recognized as the erroneous word. The method 
can include replacing one of the recognized words corresponding to a speech element 
comprising one syllable with a long vowel with a previously recognized word comprising 
one syllable with a short vowel corresponding to said long vowel wherein a number of 
the generated words do not conform to a previously registered number in the speech 
recognition system. Finally, the method can include replacing the previously recognized 
word corresponding to a speech element comprising one syllable with a short vowel 
with another previously recognized word corresponding to one syllable with a long 
vowel, wherein the long vowel can correspond to the short vowel. 






Docket No. 6169-237 

BRIEF DESCRIPTION OF THE DRAWINGS 

For a more complete understanding of the present invention and the advantages 
thereof, reference is now made to the following description taken in conjunction with the 
accompanying drawings. 

Fig. 1 is a block diagram showing an exemplary outline of a speech recognition 
system for use with the present invention. 

Fig. 2 is a table showing an example of a speech elements expression table 
which can be used with the system of Fig. 1.

Fig. 3 is a flowchart of an exemplary speech recognition method illustrating an 
aspect of the invention. 

Fig. 4 is a list showing exemplary reduced grammar data. 

Fig. 5 is a list showing further exemplary reduced grammar data. 

Fig. 6 is a flowchart showing another exemplary speech recognition method 
illustrating a further aspect of the invention. 

Fig. 7(a) is a flowchart showing an exemplary speech recognition method 
illustrating another aspect of the invention. 

Fig. 7(b) is exemplary grammar data for use with the present invention. 

Fig. 8 is a list showing exemplary grammar data to which a non-uniform
probability distribution has been applied.

Fig. 9(a) is an exemplary list showing a grammar for recognizing a four digit
string.

Fig. 9(b) is a table showing exemplary, practical speech elements expressions.

Fig. 9(c) is a list showing an example of grammar data to which the speech
elements expression of Fig. 9(b) can be applied.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the present invention will be described in detail with reference to
the accompanying drawings below. Note that the present invention can be embodied in
various other forms. Accordingly, the invention should not be limited to a strict
interpretation of the description of the embodiments. It should be understood that the
same numerals should refer to the same elements throughout the detailed description.

A method and a system for speech recognition will be mainly described in the
following embodiments, but as is clear to those skilled in the art, the present invention
also can be embodied as a medium in which computer usable program codes are
stored as well as the method and the system. Therefore, the present invention can be
realized within embodiments such as hardware, software, and a combination of
software and hardware. As a medium in which program codes are stored, any
computer readable medium that includes a hard disc, a CD-ROM, an optical storage
device or a magnetic storage device can be exemplified.

A computer system that can be used with the present invention can include a
central processing unit (CPU), a main memory (RAM: Random Access Memory), a
nonvolatile storage device (ROM: Read Only Memory) and the like, which are mutually
connected by buses. In addition, coprocessors, an image accelerator, a cache
memory, an input/output control device (I/O) and the like can be connected to the
buses. An external storage device, a data input device, a display device, a
communication control device and the like may be connected to the buses via a
suitable interface. It should be appreciated that the computer system can include
hardware resources with which a computer system is typically equipped besides the
above-described components. A hard disc can be a typical external storage device, but
the invention is not so limited; a magneto-optical storage device, an optical storage
device, and semiconductor storage devices such as a flash memory
can be included. Note that a read only storage device such as a CD-ROM, which can
be used for reading a program, can be included in the external storage device in the
case when it is used only for reading data or a program. The data input device can
include an input device such as a keyboard and a pointing device such as a mouse.
The data input device also includes a speech input device. A CRT, a liquid crystal 
display device, and a plasma display device are typical display devices. A computer 
system which can be used in conjunction with the inventive arrangements disclosed 
herein can include various kinds of computers such as a personal computer, a work 
station, and a mainframe computer. Computer programs for use with the present 
invention and which can be included with the computer system can be realized in a 
centralized fashion in one computer system, or in a distributed fashion where different 
elements are spread across several interconnected computer systems. In that case, 
the program can be referred to by an address such as a DNS, URL, or the like. Any 
kind of computer system or other apparatus adapted for carrying out the methods 
described herein is suited. 

Fig. 1 is a block diagram showing an exemplary outline of a speech recognition 
system which can be used with the invention disclosed herein. The speech recognition 
system can include a voice or speech recognition engine 2 where speech or user 
spoken utterances of a user 1 can be received. Grammar data (grammar) 3 to be 
applied to the speech recognition engine 2 and a voice or speech elements expression 
table 4 to be applied to the grammar data 3 also can be included. 

A speech signal of the user 1 can be converted to an electric signal by an input 
device, for example, a microphone or a headset. A/D (analog/digital) conversion can be
performed. The signal can be converted to wave-form data that is expressed as digital 
data. The wave-form data can be analyzed or converted into speech elements and 
compared with the grammar data 3 by the speech recognition engine 2. Accordingly, 
speech recognition can be performed in this manner. 

In the speech recognition engine 2, the grammar data 3 which best suits the inputted
speech elements can be selected. The speech elements expression
table 4 can be applied to the grammar data, and a large amount of grammar data 3 (in
which speech elements based on the speech elements expression table 4 are
arranged) can be prepared in the form of a grammar that entered speech may
follow or in which speech elements may be pronounced. The database of the grammar data 3
can be referred to, and the grammar data 3 that suits the inputted speech elements can
be selected.

The grammar used in the embodiment can be equivalent to the one shown in 
Fig. 9(a). However, the speech elements expression table used in the embodiment can 
be different from the one of Fig. 9(b). 

Fig. 2 is a table showing an example of the speech elements expression table 
used in the present invention. Conventionally, for example in the case of "0", a 
character "0" and four speech elements ("ree", "ree:", "rei", "zero") can correspond
with each other. In one aspect of the invention, two kinds of readings for "0" can be 
considered, that is, "rei" and "zero". Reading information can be added to each speech 
element ("rei" and "zero" for "0"). Note that a reading of "maru" can be considered for 
"0", but for illustration purposes only two kinds of readings are exemplified here. 

Other readings can include: "shi" and "yon" for "4"; "shichi" and "nana" for "7";
and "ku" and "kyuu" for "9". A plurality of readings for "0", "4", "7" and "9" can
be considered as described above, but these readings are only examples; other
readings can be added to the speech elements expression
table and considered as well. If a plurality of readings are
considered for a number other than the above, such readings can
be added too. In addition, though numbers are exemplified in the embodiment, the
invention also can be applied to Chinese characters (kanji), alphabetic characters and
other characters. If a plurality of
readings are considered for a character, a speech elements expression table can be
made wherein the plurality of readings are included.

An exemplary method of the speech recognition will be described with reference 
to the grammar and the speech elements expression below. Fig. 3 is a flowchart 
showing an example of the speech recognition method of the invention. In one
embodiment of the invention, for example, transaction data can be received or inputted by
speech via telephone.

First, base grammar data, which becomes a base for the
speech recognition system of the invention, can be introduced (step 10). The introduced base grammar
data can be the same as that of Fig. 9(c) except that reading information can be added
as shown in Fig. 2.

Next, a message "Please say your customer number" can be sent to a user from 
speech pronunciation means of the system side such as an audio playback system or 
text to speech technology (step 11). Assuming that the user's pronunciation is 
"shichiyonzero", recognition can be performed with the base grammar data in response 
to the speech entered (step 12). If the system recognizes "740", a message "Is 740 
correct?" can be outputted from the system side (step 13). When the user says "Yes" 
in response thereto, a recognition result of "740" can be determined. If the recognition 
result is an error, the procedure can return to step 11 for entering or receiving the
speech again. 

After "740" has been determined as the first speech recognition result, for
illustration it can be assumed that the user has read "7" as "shichi", "4" as "yon" and "0"
as "zero". It is possible to assume that the user maintains the same reading at least in
the same conversation or in conversations within a short period of time (for example one day).
The assumption is based on the knowledge of the present inventors that the same
person is apt to maintain the same reading in the same conversation. Given this
tendency, there is only a small possibility that the user will pronounce "7" as "nana".
Similarly, it can be considered that there is only a small possibility of pronouncing "4" as
"shi" and "0" as "rei". Therefore, reduced grammar data is introduced in the next step
(step 14).

Fig. 4 is an exemplary list showing reduced grammar data. The grammar data
can be equivalent to a grammar specified using Backus-Naur Form (BNF)
expressions. In the reduced grammar data, with regard to the character "7", the speech element "nana"
corresponding to the reading "nana" can be deleted. Accordingly, the speech elements can be
limited to "hichi" and "shichi" corresponding to "shichi" (20). Similarly, regarding "4",
the speech elements "shi", "shii" and "shii:" corresponding to "shi" can be deleted, and
thus the speech elements are limited to "yo_n" corresponding to "yon" (21). With regard to "0", the speech
elements "ree", "ree:" and "rei" corresponding to "rei" can be deleted and the speech elements can be limited
to "zero" corresponding to "zero" (22). As described above, the recognition probability
can be further improved when speech elements with a small possibility of being pronounced are
deleted and recognition is performed by using the grammar data to which the reduced
speech elements are applied.
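
The reduction of the grammar data can be sketched as follows, assuming the base grammar is held as a mapping from each character to its readings and from each reading to its speech elements. The entries for "0", "4" and "7" follow the text; the data layout and the name reduce_grammar are assumptions and do not reproduce the format of Fig. 4.

```python
# Readings and speech elements for "0", "4" and "7" as given in the text.
BASE_GRAMMAR = {
    "0": {"rei": ["ree", "ree:", "rei"], "zero": ["zero"]},
    "4": {"shi": ["shi", "shii", "shii:"], "yon": ["yo_n"]},
    "7": {"nana": ["nana"], "shichi": ["hichi", "shichi"]},
}


def reduce_grammar(grammar, used_elements):
    """Keep, for each character, only the readings the speaker actually used."""
    reduced = {}
    for character, readings in grammar.items():
        kept = {
            reading: list(elements)
            for reading, elements in readings.items()
            if any(element in used_elements for element in elements)
        }
        # A character that was not heard yet keeps all of its readings.
        reduced[character] = kept if kept else {r: list(e) for r, e in readings.items()}
    return reduced


# The first response "shichiyonzero" was recognized as "740".
reduced = reduce_grammar(BASE_GRAMMAR, {"shichi", "yo_n", "zero"})
```

Note that the reading "shichi" keeps both of its speech elements, "hichi" and "shichi", because the reduction operates on readings rather than on individual speech elements.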

The reduced grammar data can be applied and the procedure can proceed to
the next step. The system can output a message prompting the user to provide a
second user spoken utterance (step 15). The speech recognition system can perform
recognition in response to a received user spoken utterance (step 16). Assuming that
the user spoken utterance is "zero san, no ichi shichi kyu:", recognition accuracy can be
improved for the received pronunciations of "ichi" and "shichi", which were
originally difficult to discriminate when the base grammar data was used, because
"7" is now limited to "shichi". Moreover, since the user pronounces "9" as "kyu:", the
speech element "ku" can be deleted. Thus the grammar further can be reduced and
applied (step 18).

Fig. 5 is an exemplary list showing the further reduced grammar data. The
speech element "ku" has been deleted with regard to "9" (23). A method can be
adopted in which grammar data corresponding to the speech elements
expression table is prepared in advance for each combination of readings, and
unnecessary grammar data is deleted from the second recognition onward. According to
an aspect of the invention, by utilizing the tendency that the reading of numbers and the
like is consistent and the same reading is maintained by the same
person, recognition for the second time can be performed after speech elements with low
reading possibilities have been deleted. Accordingly, recognition probability for the
second time can be improved. Note that the present invention is not intended for
learning acoustic characteristics peculiar to a speaker; rather, the invention is directed to
temporary learning. Speech recognition can start from the first step by using the base
grammar data in the case of another speaker, or on a different day even for the same
speaker. Therefore, there is no need to make a database for each speaker, and a
system of the same configuration can be applied to any speaker. As a result, the system
need not be customized for each speaker. Thus, the present invention can be
implemented and applied easily.

When a check digit is included at a specific position of a number to be entered, 
such as a credit card number, a word presumed to be erroneous can be exchanged through 
error correction by using the check digit. The procedure can then proceed to the next 
step using the recognition result after the correction. 

Fig. 6 is a flowchart showing another exemplary speech recognition method of 
the present invention. The base grammar data can be introduced (step 30), a message 
prompting the user to enter a card number can be output (step 31), and speech recognition 
can be performed by applying the base grammar data (step 32). If the entered speech 
is pronounced "ichi ni san shi go roku shichi hachi ku zero ichi ni san shi go roku", the 
probability that the entered speech is erroneously recognized as something other than 
"1234567890123456" is not particularly low, since this is the first recognition. 
In the case of a credit card, wherein the check digits are typically the last two figures 
"56", verification can be performed as to whether the check digits are correct (step 33). 
If the verification shows the result to be incorrect, a digit that may have been erroneously 
recognized can be exchanged (step 34) and verification can be performed again (step 
33). When verification confirms a correct result, the procedure proceeds to the next 
step and the recognition result (number) after the error correction is confirmed 
(step 35). 

Numbers with a possibility of erroneous recognition can be: 1 "ichi", 7 "shichi" 
and 8 "hachi" (when "7" is pronounced "shichi" or "7" is not recognized); 1 "ichi" and 8 
"hachi" (when "7" is pronounced "nana"); 4 "shi" and 7 "shichi" (when "4" is pronounced 
"shi" and "7" "shichi"); or 6 "roku" and 9 "ku" (when "9" is pronounced "ku"). The 
exchange of step 34 can be performed by mutually replacing these numbers. 
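
The verify-and-exchange loop of steps 33 and 34 can be sketched as follows. The text does not specify how the check digits are computed, so toy_check below is a purely hypothetical stand-in (a position-weighted digit sum compared with the last two figures). The CONFUSABLE table also simplifies the text by ignoring which reading the speaker used, and only a single exchanged digit is tried.

```python
# Digits listed in the text as mutually confusable (simplified: the
# conditions on which reading was used are ignored in this sketch).
CONFUSABLE = {
    "1": "78",
    "7": "184",
    "8": "17",
    "4": "7",
    "6": "9",
    "9": "6",
}


def toy_check(number):
    """Hypothetical check: the last two figures must equal the
    position-weighted sum of the preceding digits, modulo 100."""
    body, check = number[:-2], number[-2:]
    weighted = sum(pos * int(d) for pos, d in enumerate(body, start=1))
    return weighted % 100 == int(check)


def correct_with_check_digits(recognized, verify=toy_check):
    """Return the recognized number if it verifies, else the first
    single-digit exchange that verifies, else None."""
    if verify(recognized):
        return recognized
    for i, digit in enumerate(recognized):
        for alt in CONFUSABLE.get(digit, ""):
            candidate = recognized[:i] + alt + recognized[i + 1:]
            if verify(candidate):
                return candidate
    return None


# "17427" is valid under toy_check; suppose "7" was misheard as "1".
corrected = correct_with_check_digits("11427")
```

Under these assumptions the misrecognized "11427" is restored to "17427" without asking the user to repeat the number. A weak check function can accept more than one candidate, in which case the first match is not guaranteed to be the intended number.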

Finally, based on the recognition result determined from the first recognition and 
the above-described error correction processing, unnecessary speech elements can be 
deleted, reduced grammar data can be made, and the reduced grammar data can be 
introduced for recognition during a second recognition (step 36). 

According to a method of the invention, an error can be corrected by using the 
check digit function. Further, grammar data can be reduced each time a speech input 
is received from a user. Thus, the total processing time can be shortened. 

When only four figures are recognized despite the fact that the grammar 
demands a five-figure number, it is possible that a repetition of a syllable with a 
short vowel (for example, 2 "ni" spoken twice) was recognized as a single syllable with 
a long vowel (2 "nii:"). Fig. 7(a) is a flowchart showing another exemplary method of 
the invention. Fig. 7(b) depicts grammar data to be applied. As Fig. 7(b) indicates, a 
five-digit number is required as specified by the grammar (40). 

As shown in Fig. 7(a), the base grammar data can be introduced (step 41), a 
message prompting the user to enter a card number can be output (step 42), and 
recognition can be performed by applying the base grammar data (step 43). If the 
entered speech is pronounced "ichi ni ni san shi", an erroneous recognition result of 
"1234" can be produced. It can then be determined whether five figures were 
recognized (step 44). If only four characters are recognized, it can be presumed that 
"nini" was erroneously recognized as the single character "2" consisting of one syllable 
with a long vowel. Accordingly, "2" can be replaced with "22" (step 45), and the 
procedure can proceed to a confirmation step (step 46). When five characters are 
recognized normally, the procedure can proceed directly to step 46. Thereafter, based on 
the recognition result determined from the first recognition and the above-described 
error correction processing, unnecessary speech elements can be deleted. 
Accordingly, reduced grammar data can be made and introduced for use during the 
second recognition (step 47). 
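
Steps 44 and 45 can be sketched as follows. The function name expand_collapsed_repeats and the collapsible parameter are assumptions made for illustration. Only "2" is treated as collapsible because it is the only digit the text names. When more than one position could be expanded, the sketch returns every candidate and leaves the choice to the confirmation step.

```python
def expand_collapsed_repeats(recognized, expected_len, collapsible=("2",)):
    """Return candidate corrections for a result that is one digit short.

    If the length is already correct, the result is returned unchanged.
    If exactly one digit is missing, each collapsible digit is doubled in
    turn. Any other length mismatch yields no candidates.
    """
    if len(recognized) == expected_len:
        return [recognized]
    if len(recognized) != expected_len - 1:
        return []
    candidates = []
    for i, digit in enumerate(recognized):
        if digit in collapsible:
            candidates.append(recognized[:i] + digit + recognized[i:])
    # Drop duplicates while preserving order (e.g. adjacent equal digits).
    return list(dict.fromkeys(candidates))


# "ichi ni ni san shi" recognized as "1234" although five digits are required.
candidates = expand_collapsed_repeats("1234", expected_len=5)
```

For the example in the text there is a single candidate, "12234", which can be passed to the confirmation of step 46.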

According to an exemplary method of the invention, a repetition of a short vowel 
that is easily recognized erroneously as a single syllable with a long vowel 
can be corrected. As a result, the number of inputs required from a user to correct an 
error can be reduced, thereby improving convenience and reducing total 
processing time. In this embodiment, an example is shown wherein a repetition of a 
short vowel is recognized erroneously as one character with a long vowel, but it 
should be appreciated that an error can be corrected in a similar manner wherein one 
character with a long vowel is recognized erroneously as a repetition of a short vowel. 
In that case, the recognition result contains more characters than expected. 

The present invention has been specifically 
described based on the embodiments. However, the present invention is not limited to 
the embodiments, and various modifications can be made within the scope of the 
present invention. For example, as shown in Fig. 8, a non-uniform probability 
distribution can be applied to the grammar. Specifically, referring to "0", it can be found 
through experience that the probability of "zero" being pronounced is larger than the 
probability of "rei" being pronounced. In such a case, as shown in Fig. 8, three entries of 
"zero" and one entry of "rei" can be allocated to <0>. In this case, a 
pronunciation probability can be assumed wherein 75% corresponds to "zero" and 25% 
to "rei". Note that the speech elements "ree", "ree:" and "rei" can be uniformly allocated 
to the pronunciation of "rei". Recognition accuracy can be further improved by applying 
such a non-uniform probability distribution. 
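
The arithmetic behind this allocation can be sketched as follows. The list-based representation of the grammar entries and the function names are assumptions for illustration. The three-to-one allocation and the uniform split of "rei" over its three speech elements are taken from the text.

```python
from collections import Counter
from fractions import Fraction

# Entries allocated to <0>: three of "zero" and one of "rei".
ALLOCATION_FOR_ZERO = ["zero", "zero", "zero", "rei"]

# Speech elements belonging to each reading.
ELEMENTS = {"zero": ["zero"], "rei": ["ree", "ree:", "rei"]}


def reading_probabilities(allocation):
    """Probability of each reading, proportional to its number of entries."""
    counts = Counter(allocation)
    total = len(allocation)
    return {reading: Fraction(n, total) for reading, n in counts.items()}


def element_probabilities(allocation, elements):
    """Split each reading's probability uniformly over its speech elements."""
    probs = {}
    for reading, p in reading_probabilities(allocation).items():
        share = p / len(elements[reading])
        for element in elements[reading]:
            probs[element] = share
    return probs


by_reading = reading_probabilities(ALLOCATION_FOR_ZERO)
by_element = element_probabilities(ALLOCATION_FOR_ZERO, ELEMENTS)
```

Here by_reading assigns 3/4 to "zero" and 1/4 to "rei", and each of "ree", "ree:" and "rei" receives 1/12 in by_element.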

Moreover, the present invention also can be applied to the English language. 
For example, exemplary readings for "1-800-1122" can be: (1) "one eight hundred 
double one double two"; (2) "one eight oh oh eleven twenty two"; and (3) "one eight 
hundred one one two two". The same reading, however, can be considered to be 
maintained in one conversation or dialog. In the case of a plurality of readings for one 
expression, as in this example, recognition accuracy from the second time on can be 
improved by applying the present invention not only to Japanese but also to English. 
The same can be applied to English characters. For example, "0" has a plurality of 
readings such as "zero" and "oh". The present invention can be applied to such a case. 

In addition, English has several ways of reading a line of numbers, such as: (1) reading 
figures by dividing the numbers into groups of two figures; (2) reading numbers 
continuously (solid reading); (3) official figure reading; and (4) expressing a 
continuation of the same number as double-xx, triple-xx and the like. For example, 
"1999" has readings such as 
"nineteen ninety nine", "one nine nine nine", "one thousand nine hundred ninety nine", 
"one triple nine" and "nineteen double nine". Such a plurality of readings requires 
grammar data that corresponds to the respective readings. Although in the first 
recognition, the grammar needs to include all the readings, for subsequent recognitions, 
the grammar data that is considered to be unnecessary can be deleted by applying the 
present invention. Thus, the recognition probability from the second time on can be 
improved in the case of English as well. 
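
The reduction by reading style can be sketched for the "1999" example as follows. The style labels and the lookup-table representation are assumptions made for illustration. A practical grammar would generate the readings for arbitrary numbers rather than list them.

```python
# Readings of "1999" from the text, grouped by the four reading styles.
# The style labels are hypothetical names chosen for this sketch.
READINGS_1999 = {
    "two-figure": ["nineteen ninety nine"],
    "continuous": ["one nine nine nine"],
    "official": ["one thousand nine hundred ninety nine"],
    "repeat": ["one triple nine", "nineteen double nine"],
}


def style_of(utterance, readings):
    """Return the style whose readings contain the utterance, or None."""
    for style, phrases in readings.items():
        if utterance in phrases:
            return style
    return None


def keep_style(readings, style):
    """Return grammar data restricted to one style. If the style is
    unknown, the full grammar data is kept."""
    if style not in readings:
        return dict(readings)
    return {style: readings[style]}


# First recognition uses every style; later recognitions keep only the one used.
used = style_of("one nine nine nine", READINGS_1999)
reduced = keep_style(READINGS_1999, used)
```

In this sketch the first recognition matches against all four styles, and subsequent recognitions in the same dialog use only the "continuous" entries.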

Further, the present invention is not limited to numbers and can be applied to 
regular words. For example, there are words of the same meaning, which can be 
expressed in a plurality of readings such as "ao/blu: (blue)", "sora tobu 
emban/yu:fo:/yu:efuo: (UFO)" and "iriguchi/hairikuchi (entrance)". The present 
invention can be used with regard to such words. Words which were not expressed in 
the first recognition can be deleted from the second recognition, thus the recognition 
probability from the second recognition on can be improved. 

Still further, the present invention can be applied to auxiliary verbs. For example, 
expressions such as "...da" or "...desu" are commonly used. In that case, an 
expression that was not used in the first recognition can be deleted, and the recognition 
probability from the second recognition on can be improved. Another aspect of the 
present invention can be for use with verbs, adjectives, and other parts of speech in the 
case where a plurality of readings or expressions thereof exist. In each case, speech 
recognition accuracy can be improved. 

Advantages obtained by representative aspects of the disclosed 
invention can include improved speech recognition accuracy, particularly recognition 
accuracy for characters or words to which a plurality of readings are given. Moreover, 
recognition accuracy can be improved for sounds having little redundancy, such as numbers 
in Japanese, characters and the like, in the case where similar sounds correspond to 
different characters or words. Additionally, speech recognition accuracy can be 
improved in the case where a character of one syllable with a long vowel is confused 
with a repetition of a character with a short vowel. 

Although the preferred embodiments of the present invention have been 
described in detail, it should be understood that various changes, substitutions and 
alterations can be made therein without departing from the spirit and scope of the invention 
as defined by the following claims. 